Learning Decision Lists Using Homogeneous Rules

Authors

  • Richard Segal
  • Oren Etzioni

Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195
{segal, etzioni}@cs.washington.edu
Abstract

A decision list is an ordered list of conjunctive rules (Rivest 1987). Inductive algorithms such as AQ and CN2 learn decision lists incrementally, one rule at a time. Such algorithms face the rule overlap problem: the classification accuracy of the decision list depends on the overlap between the learned rules. Thus, even though the rules are learned in isolation, they can only be evaluated in concert. Existing algorithms solve this problem by adopting a greedy, iterative structure. Once a rule is learned, the training examples that match the rule are removed from the training set. We propose a novel solution to the problem: composing decision lists from homogeneous rules, rules whose classification accuracy does not change with their position in the decision list. We prove that the problem of finding a maximally accurate decision list can be reduced to the problem of finding maximally accurate homogeneous rules. We report on the performance of our algorithm on data sets from the UCI repository and on the MONK's problems.

Introduction

A decision list is an ordered list of conjunctive rules (Rivest 1987). A decision list classifies examples by assigning to each example the class associated with the first conjunctive rule that matches the example.

The decision list induction problem is to identify, from a set of training examples, the decision list that will most accurately classify future examples. A learning algorithm requires some means of predicting how a decision list will perform on future examples. One solution is to use a heuristic scoring function that estimates the accuracy of the list on future examples based on its accuracy on the training examples.¹ The overall induction problem can then be decomposed into choosing an appropriate scoring function and finding a decision list that maximizes it.

A simple algorithm for finding a maximal decision list is to exhaustively search the space of decision lists and output the best one found. This algorithm is impractical because the number of decision lists is doubly exponential in the number of attributes. Many existing algorithms (e.g., Michalski 1969; Clark and Niblett 1989; Rivest 1987; Pagallo and Haussler 1990) instead learn decision lists incrementally, by searching the space of conjunctive rules for "good" rules and then combining the rules to form a decision list. Such algorithms face the problem of rule overlap: the accuracy of a decision list is not a straightforward function of the accuracy of its constituent rules. To illustrate this point, consider two rules r1 and r2, each having 80% accuracy and 50% coverage on the training examples. The rules may not overlap at all, which yields a two-rule decision list with 80% accuracy and 100% coverage. However, the rules may have a 40% overlap, in which case the accuracy of the decision list (r1, r2) can drop to 67%, with a coverage of 60%. In general, any algorithm that forms a classifier by combining rules learned separately has to overcome the rule overlap problem.
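To make the overlap arithmetic concrete, the following minimal Python sketch shows first-match classification and reproduces the 67%/60% worst case over 100 synthetic examples. The representation (a rule as a (tests, class) pair, an example as a dict of attribute values) and the function names are illustrative assumptions, not the paper's implementation.

    # Minimal sketch of first-match classification and the rule overlap effect.
    # The rule/example representation is an illustrative assumption.

    def matches(tests, example):
        """An example matches a rule when it passes all of the rule's tests."""
        return all(example.get(attr) == value for attr, value in tests.items())

    def classify(decision_list, example, default=None):
        """Assign the class of the first rule in the list that matches."""
        for tests, goal_class in decision_list:
            if matches(tests, example):
                return goal_class
        return default

    # Worst case from the text, over 100 training examples identified by index:
    # r1 and r2 each cover 50 examples and classify 40 of them correctly
    # (80% accuracy, 50% coverage); they overlap on 40 examples, and r2's
    # correct predictions fall entirely inside the overlap.
    r1_covers, r2_covers = set(range(0, 50)), set(range(10, 60))
    r1_correct = set(range(0, 40))   # 40 of r1's 50 examples are correct
    r2_correct = set(range(10, 50))  # 40 of r2's 50 examples, all in the overlap

    covered = r1_covers | r2_covers                  # 60 examples: 60% coverage
    # In the list (r1, r2), r1 decides every example it covers, including the
    # overlap, so r2 only gets to decide the examples outside r1.
    correct = r1_correct | (r2_correct - r1_covers)  # 40 correct examples
    print(len(covered), len(correct) / len(covered))  # -> 60 0.666...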
Algorithms such as AQ and CN2 address the overlap problem by adopting an iterative structure. As each rule is learned, it is inserted into the decision list, and the examples covered by the rule are removed from the training set. The algorithm learns the next rule from the reduced training set, and the process is repeated until the training set is exhausted. The overlap problem is thus addressed by learning each successive rule from a training set in which examples matching previously learned rules have been filtered out. Note that this iterative approach is greedy: once the algorithm learns a rule, it is committed to keeping that rule in the decision list, and all subsequent learning is based on this commitment.

While the greedy approach has proven effective in practice, it has several problems. First, as pointed out by Clark and Niblett (1989), the interpretation of each rule depends on the rules that precede it. This makes decision lists difficult to comprehend, because the learned rules cannot be considered in isolation. Second, on each iteration fewer training examples are available to the learning algorithm, which hinders its ability to learn; this is particularly important when training data is scarce. Finally, poor rule choices at the beginning of the list can significantly reduce the accuracy of the decision list that is learned.

Nevertheless, Rivest showed that a greedy, iterative algorithm can provably PAC-learn the concept class k-DL, the class of decision lists composed of rules of length at most k (Rivest 1987). However, Rivest's PAC guarantee presupposes that there exist 100% accurate rules of length at most k that cover the training examples. This strong assumption neatly sidesteps the overlap problem, because the accuracy of a 100% accurate rule remains unchanged regardless of the rules that precede it in the decision list. The assumption is often violated in practice: a full complement of 100% accurate rules of length at most k cannot be found when there is noise in the training data, when the concept to be learned is not in k-DL (relative to the algorithm's attribute language), or when the concept is probabilistic.

Our main contribution is a solution to the overlap problem that is both theoretically justified and practical. We borrow the notion of homogeneity from the philosophical literature (Salmon 1984) to solve the overlap problem in learning decision lists. Informally, a homogeneous rule is one whose accuracy does not change with its position in the decision list. Formally, let E denote the universe of examples, T the set of tests within a domain, and G the set of goal classes. Let DL denote the set of all decision lists. Let c(e) denote the classification of example e, and C(e, d) the classification that decision list d assigns to example e. We write a rule as A → g, where A ⊆ T and g ∈ G; when an example e passes all the tests in A, we say e ∈ A. Let P be a probability distribution over examples. We define the accuracy of a decision list d with respect to P as the probability that d's classification agrees with the example's true class:

    Accuracy(d) = Pr_{e ~ P} [ C(e, d) = c(e) ]

* This research was funded in part by Office of Naval Research grant 92-J-1946 and by National Science Foundation grants IRI-9211045 and IRI-9357772. Richard Segal is supported, in part, by a GTE fellowship.
¹ To avoid overfitting, additional factors are often included, such as the size of the list and the number of training examples covered.
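For contrast with the homogeneous-rule approach, the greedy covering scheme the introduction attributes to AQ- and CN2-style learners can be sketched as below. Here learn_one_rule is a hypothetical stand-in for each system's conjunctive-rule search; this is a schematic of the iterative scheme described above, not the algorithm this paper proposes.

    # Schematic of the greedy, iterative (covering) approach described in the
    # introduction. learn_one_rule is a hypothetical stand-in for the
    # conjunctive-rule search of an AQ- or CN2-style system; this is NOT the
    # homogeneous-rule algorithm the paper proposes.

    def learn_decision_list_greedy(examples, learn_one_rule):
        decision_list = []
        remaining = list(examples)
        while remaining:
            rule = learn_one_rule(remaining)  # search rules on what is left
            if rule is None:                  # no acceptable rule found: stop
                break
            decision_list.append(rule)        # greedy: never reconsidered
            tests, _goal_class = rule
            # Filter out covered examples, so the next rule is learned from a
            # reduced training set -- this is how rule overlap is sidestepped.
            remaining = [e for e in remaining
                         if not all(e.get(a) == v for a, v in tests.items())]
        return decision_list

The sketch makes the drawbacks listed above visible: remaining shrinks on every iteration, and a rule appended early is never reconsidered.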

Similar Articles

Simple PAC Learning of Simple Decision Lists

We prove that log n-decision lists (the class of decision lists such that all their terms have low Kolmogorov complexity) are learnable in the simple PAC learning model. The proof is based on a transformation from an algorithm based on equivalence queries (found independently by Simon). Then we introduce the class of simple decision lists, and extend our algorithm to show that simple decision l...


Learning Decision Lists by Prepending Inferred Rules

This paper describes a new algorithm for learning decision lists that operates by prepending successive rules to the front of the list under construction. This contrasts with the original decision list induction algorithm, which operates by appending successive rules to the end of the list under construction. The new algorithm is demonstrated in the majority of cases to produce smaller classifiers that...


Computational Sample Complexity and Attribute-Efficient Learning

Two fundamental measures of the efficiency of a learning algorithm are its running time and the number of examples it requires (its sample complexity). In this paper we demonstrate that even for simple concept classes, an inherent tradeoff can exist between running time and sample complexity. We present a concept class of 1-decision lists and prove that while a computationally unbounded learner can...


Toward Attribute Efficient Learning of Decision Lists and Parities

We consider two well-studied problems regarding attribute-efficient learning: learning decision lists and learning parity functions. First, we give an algorithm for learning decision lists of length k over n variables using 2^(Õ(k^(1/3))) log n examples and time n^(Õ(k^(1/3))). This is the first algorithm for learning decision lists that has both subexponential sample complexity and subexponential running time in...




Publication date: 2000